工业X射线分析在需要保证某些零件的结构完整性的航空航天,汽车或核行业中很常见。但是,射线照相图像的解释有时很困难,可能导致两名专家在缺陷分类上不同意。本文介绍的自动缺陷识别(ADR)系统将减少分析时间,还将有助于减少对缺陷的主观解释,同时提高人类检查员的可靠性。我们的卷积神经网络(CNN)模型达到94.2 \%准确性(MAP@iou = 50 \%),当应用于汽车铝铸件数据集(GDXRAR)时,它被认为与预期的人类性能相似,超过了当前状态该数据集的艺术。在工业环境上,其推理时间少于每个DICOM图像,因此可以安装在生产设施上,不会影响交付时间。此外,还进行了对主要高参数的消融研究,以优化从75 \%映射的初始基线结果最高94.2 \%map的模型准确性。
translated by 谷歌翻译
Advances in computer vision and machine learning techniques have led to significant development in 2D and 3D human pose estimation from RGB cameras, LiDAR, and radars. However, human pose estimation from images is adversely affected by occlusion and lighting, which are common in many scenarios of interest. Radar and LiDAR technologies, on the other hand, need specialized hardware that is expensive and power-intensive. Furthermore, placing these sensors in non-public areas raises significant privacy concerns. To address these limitations, recent research has explored the use of WiFi antennas (1D sensors) for body segmentation and key-point body detection. This paper further expands on the use of the WiFi signal in combination with deep learning architectures, commonly used in computer vision, to estimate dense human pose correspondence. We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input. This paves the way for low-cost, broadly accessible, and privacy-preserving algorithms for human sensing.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Static subword tokenization algorithms have been an essential component of recent works on language modeling. However, their static nature results in important flaws that degrade the models' downstream performance and robustness. In this work, we propose MANTa, a Module for Adaptive Neural TokenizAtion. MANTa is a differentiable tokenizer trained end-to-end with the language model. The resulting system offers a trade-off between the expressiveness of byte-level models and the speed of models trained using subword tokenization. In addition, our tokenizer is highly explainable since it produces an explicit segmentation of sequences into blocks. We evaluate our pre-trained model on several English datasets from different domains as well as on synthetic noise. We find that MANTa improves robustness to character perturbations and out-of-domain data. We then show that MANTa performs comparably to other models on the general-domain GLUE benchmark. Finally, we show that it is considerably faster than strictly byte-level models.
translated by 谷歌翻译
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show a way to extract knowledge from already known classes to guide the discovery process of novel classes in the context of tabular data which contains heterogeneous variables. A part of this process is done by a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is not only applicable to images but also to heterogeneous tabular data.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Diffusion models have achieved unprecedented performance in generative modeling. The commonly-adopted formulation of the latent code of diffusion models is a sequence of gradually denoised samples, as opposed to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, as well as an invertible DPM-Encoder that maps images into the latent space. While our formulation is purely based on the definition of diffusion models, we demonstrate several intriguing consequences. (1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. In light of this finding, we propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation. Furthermore, applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) One can guide pre-trained diffusion models and GANs by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we demonstrate that diffusion models have better coverage of low-density sub-populations and individuals than GANs. The code is publicly available at https://github.com/ChenWu98/cycle-diffusion.
translated by 谷歌翻译
生成模型(例如gan和扩散模型)以无监督的方式学习潜在的数据分布。但是,许多感兴趣的应用都需要从生成模型的输出空间的特定区域或在一系列特征范围内进行采样。为了允许在这些情况下进行有效的采样,我们提出了生成视觉提示(提示),这是一个通过合并任意现成模型的知识来对预训练的生成模型进行分配控制的框架。 Prestgen将控制定义为基于能量的模型(EBM),并通过使用可逆的神经网络近似EBM来以馈送方式进行示例图像,从而避免了推理时的优化。我们演示了提示如何使用各种出现的模型来控制多种生成模型(例如,stylegan2,stylenerf,styLenerf,bixfusion autocoder和nvae):(1)使用剪辑模型,提示可以通过文本引导的示例图像,(2)使用图像分类器,提示器可以在一组属性上脱离偏差的生成模型,并且(3)使用反图形模型,提示器可以在不同姿势中示例相同身份的图像。 (4)最后,Prestgen揭示了剪辑模型在用作控制时显示“报告偏差”,并且提示器可以以迭代方式进一步偏离此受控分布。我们的代码可在https://github.com/chenwu98/generative-visual-prompt上找到。
translated by 谷歌翻译
在本文中,我们介绍Bayesldm,这是一个用于贝叶斯纵向数据建模的系统,该系统由高级建模语言组成,具有针对复杂的多变量时间序列数据建模的特定功能,并与编译器相结合,可以生成优化的概率程序代码,以在指定模型中执行指定的推理。 Bayesldm支持贝叶斯网络模型的建模,其特定关注动态贝叶斯网络(DBN)的高效,声明性规范。 Bayesldm编译器将模型规范与可用数据和输出代码相结合,用于执行贝叶斯推断,以同时处理丢失的数据,同时处理未知模型参数。这些功能有可能通过抽象产生计算有效的概率推断代码的过程来显着加速域中的迭代建模工作流,这些迭代建模工作流程涉及复杂纵向数据的分析。我们描述了Bayesldm系统组件,评估表示和推理优化的效率,并提供了该系统在分析异质和部分观察到的移动健康数据的应用示例。
translated by 谷歌翻译
3D面部建模一直是计算机视觉和计算机图形学研究的活跃领域,从虚拟化身中的面部表达转移到合成数据生成,助长了应用。现有的3D深度学习生成模型(例如,VAE,gan)允许生成紧凑的面部表征(形状和纹理),可以在形状和外观空间中建模非线性(例如,散射效果,镜面等)。但是,他们缺乏控制微妙表达产生的能力。本文提出了一种新的3D面部生成模型,该模型可以使身份和表达不适,并提供对表达式的颗粒状控制。特别是,我们建议使用一对监督自动编码器和生成对抗网络来产生高质量的3D面,无论是外观和形状而言。实验结果是用整体表达标签或作用单元标签学到的3D面的产生结果表明,我们如何将身份和表达分离;在保留身份的同时,获得精细的表达方式。
translated by 谷歌翻译